representation collapse
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- (6 more...)
- Government (0.68)
- Information Technology > Security & Privacy (0.46)
- North America > Canada > Quebec > Montreal (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Education > Educational Setting (0.45)
- Health & Medicine > Therapeutic Area > Neurology (0.45)
- North America > United States (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > Canada (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
Elliptical Attention
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product self-attention computes attention weights among the input tokens using Euclidean distance, which makes the model prone to representation collapse and vulnerable to contaminated samples. In this paper, we propose using a Mahalanobis distance metric for computing the attention weights to stretch the underlying feature space in directions of high contextual relevance. In particular, we define a hyper-ellipsoidal neighborhood around each query to increase the attention weights of the tokens lying in the contextually important directions.
On the Representation Collapse of Sparse Mixture of Experts
Sparse mixture of experts provides larger model capacity while requiring a constant computational overhead. It employs the routing mechanism to distribute input tokens to the best-matched experts according to their hidden representations. However, learning such a routing mechanism encourages token clustering around expert centroids, implying a trend toward representation collapse. In this work, we propose to estimate the routing scores between tokens and experts on a low-dimensional hypersphere. We conduct extensive experiments on cross-lingual language model pre-training and fine-tuning on downstream tasks. Experimental results across seven multilingual benchmarks show that our method achieves consistent gains. We also present a comprehensive analysis on the representation and routing behaviors of our models. Our method alleviates the representation collapse issue and achieves more consistent routing than the baseline mixture-of-experts methods.
Combating Bilateral Edge Noise for Robust Link Prediction
Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), the potential robustness under the edge noise is still less investigated. To close this gap, we first conduct an empirical study to disclose that the edge noise bilaterally perturbs both input topology and target label, yielding severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.
TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement
Ma, Zhiyong, Chen, Jiahao, Chuai, Qingyuan, Li, Zhengping
Multi-modal generation struggles to ensure thematic coherence and style consistency. Semantically, existing methods suffer from cross-modal mismatch and lack explicit modeling of commonality and discrepancy. Methods that rely on fine-grained training fail to balance semantic precision with writing style consistency. These shortcomings lead to suboptimal generation quality. To tackle these issues, we propose \textbf{\textit{TIPPo}}, a simple yet effective framework with explicit input modeling and comprehensive optimization objectives. It extracts the input text and images via multi-modal encoder and adapters, then measures the visual prototype. \textbf{T}extual, \textbf{I}mage, and \textbf{P}rototype signals are then fed to our proposed Dual Alignment Attention and Difference Operator modules before language model decoding. The proposed \textbf{Po}lishPPO reinforces the style consistency, while the unsupervised contrastive learning during SFT mitigates inter-sample representation collapse. Experimental results demonstrate the promising performance of \textbf{\textit{TIPPo}} in automatic evaluation and LLM-based criteria for creativity and semantic consistency.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > China > Guangdong Province > Guangzhou (0.05)
- North America > Canada > Ontario > Toronto (0.04)
- (2 more...)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Italy > Tuscany > Florence (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- (6 more...)
- Government (0.68)
- Information Technology > Security & Privacy (0.46)
- North America > Canada > Quebec > Montreal (0.14)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.92)
- Education > Educational Setting (0.45)
- Health & Medicine > Therapeutic Area > Neurology (0.45)